Experiments with Unit Selection Speech Databases for Indian Languages
Abstract
This paper presents a brief overview of unit selection speech synthesis and discusses the issues relevant to the development of voices for Indian languages. We discuss a few perceptual experiments conducted on Hindi and Telugu voices.
1 Role of Language Technologies
Most of the information in the digital world is accessible only to the few who can read or understand a particular language. Language technologies can provide solutions in the form of natural interfaces, so that digital content can reach the masses and information can be exchanged between people speaking different languages. These technologies play a crucial role in multi-lingual societies such as India, which has about 1652 dialects/native languages. While Hindi, written in Devanagari script, is the official language, the other 17 languages recognized by the constitution of India are: 1) Assamese, 2) Tamil, 3) Malayalam, 4) Gujarati, 5) Telugu, 6) Oriya, 7) Urdu, 8) Bengali, 9) Sanskrit, 10) Kashmiri, 11) Sindhi, 12) Punjabi, 13) Konkani, 14) Marathi, 15) Manipuri, 16) Kannada and 17) Nepali. Seamless integration of speech recognition, machine translation and speech synthesis systems could facilitate the exchange of information between two people speaking two different languages. Our overall goal is to develop speech recognition and speech synthesis systems for most of these languages. In this paper we discuss the issues related to the development of speech synthesis systems for Indian languages using unit selection techniques.
This work is done within the FestVox voice building framework [1], which offers general tools for building unit selection synthesizers in new languages. FestVox offers a language-independent method for building synthetic voices, with mechanisms to abstractly describe the phonetic and syllabic structure of a language. It is this flexibility in the voice building process that we exploited to build voices for Indian languages. Voices generated by this system may be run in the Festival Speech Synthesis System [2].
2 Speech Synthesis Systems
The objective of a text to speech system is to convert an arbitrary given text into a corresponding spoken waveform. Text processing and speech generation are the two main components of a text to speech system. The objective of the text processing component is to process the given input text and produce an appropriate sequence of phonemic units. These phonemic units are realized by the speech generation component, either by synthesis from parameters or by selection of a unit from a large speech corpus. For natural sounding speech synthesis, it is essential that the text processing component produce an appropriate sequence of phonemic units corresponding to an arbitrary input text.
3 Text Processing Front End
Before we discuss the issues related to text processing, let us briefly understand the nature of the text for which the synthesis systems are built.
3.1 Nature of the Scripts of Indian Languages
The basic units of the writing system in Indian languages are Aksharas, which are an orthographic representation of speech sounds. An Akshara in Indian language scripts is close to a syllable and is typically of one of the following forms: C, V, CV, CCV, VC and CVC, where C is a consonant and V is a vowel. All Indian language scripts have a common phonetic base, and a universal phoneset consists of about 35 consonants and about 18 vowels. The pronunciation of these scripts is almost straight
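The Akshara forms listed in Section 3.1 can be illustrated with a minimal Python sketch. It assumes phones are given as romanized symbols; the vowel inventory `VOWELS` and the helper names `cv_pattern` and `is_typical_akshara` are hypothetical choices made for illustration, not the phoneset or tools used in the paper.

```python
# Hypothetical romanized vowel symbols (an illustrative assumption,
# not the universal phoneset referred to in the text).
VOWELS = {"a", "aa", "i", "ii", "u", "uu", "e", "ai", "o", "au"}

# Typical Akshara shapes named in the text.
AKSHARA_FORMS = {"C", "V", "CV", "CCV", "VC", "CVC"}


def cv_pattern(phones):
    """Map a sequence of phone symbols to a string of C/V labels."""
    return "".join("V" if p in VOWELS else "C" for p in phones)


def is_typical_akshara(phones):
    """Return True if the phones match one of the listed Akshara shapes."""
    return cv_pattern(phones) in AKSHARA_FORMS


if __name__ == "__main__":
    examples = [["k", "a"], ["s", "t", "ii"], ["a", "m"], ["k", "a", "m", "a"]]
    for phones in examples:
        print(phones, cv_pattern(phones), is_typical_akshara(phones))
```

Anything not in `VOWELS` is treated as a consonant, which keeps the sketch short but would need a proper phoneset in a real front end.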
Publication date: 2003